Overview

Dataset statistics

Number of variables: 20
Number of observations: 122636
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 18.7 MiB
Average record size in memory: 160.0 B
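
The memory figures are mutually consistent, which a quick calculation can confirm. The sketch below assumes that each of the 20 columns costs 8 bytes per row and that the "Memory size" reported for each variable includes a 128-byte index overhead. Both values are assumptions chosen because they reproduce the reported numbers; the report itself does not state them.

```python
# Sanity check of the memory figures in the dataset statistics.
# ASSUMPTIONS (not stated in the report): 8 bytes per cell and a
# 128-byte index overhead counted once per column.
N_ROWS = 122636
N_COLS = 20
BYTES_PER_CELL = 8      # assumed
INDEX_OVERHEAD = 128    # assumed

record_size_b = N_COLS * BYTES_PER_CELL
total_mib = N_ROWS * record_size_b / 1024**2
column_kib = (N_ROWS * BYTES_PER_CELL + INDEX_OVERHEAD) / 1024

print(f"Average record size: {record_size_b:.1f} B")
print(f"Total size in memory: {total_mib:.1f} MiB")
print(f"Per-column memory size: {column_kib:.1f} KiB")
```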

Variable types

Numeric: 11
Categorical: 9

Alerts

id has a high cardinality: 122636 distinct values [High cardinality]
df_index is highly correlated with year_account_created [High correlation]
days_from_first_active_until_booking is highly correlated with days_from_account_created_until_first_booking and 2 other fields [High correlation]
days_from_account_created_until_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
day_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
day_of_week_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
year_account_created is highly correlated with df_index [High correlation]
df_index is highly correlated with year_account_created [High correlation]
days_from_first_active_until_booking is highly correlated with days_from_account_created_until_first_booking and 2 other fields [High correlation]
days_from_account_created_until_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
day_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
day_of_week_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
year_account_created is highly correlated with df_index [High correlation]
df_index is highly correlated with year_account_created [High correlation]
days_from_first_active_until_booking is highly correlated with days_from_account_created_until_first_booking and 1 other fields [High correlation]
days_from_account_created_until_first_booking is highly correlated with days_from_first_active_until_booking and 1 other fields [High correlation]
day_first_booking is highly correlated with day_of_week_first_booking [High correlation]
day_of_week_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
year_account_created is highly correlated with df_index [High correlation]
df_index is highly correlated with country_destination and 4 other fields [High correlation]
signup_flow is highly correlated with affiliate_channel and 1 other fields [High correlation]
affiliate_channel is highly correlated with signup_flow and 2 other fields [High correlation]
first_affiliate_tracked is highly correlated with affiliate_channel [High correlation]
signup_app is highly correlated with signup_flow and 1 other fields [High correlation]
country_destination is highly correlated with df_index and 2 other fields [High correlation]
days_from_first_active_until_booking is highly correlated with df_index and 5 other fields [High correlation]
days_from_account_created_until_first_booking is highly correlated with df_index and 4 other fields [High correlation]
day_first_booking is highly correlated with days_from_first_active_until_booking and 3 other fields [High correlation]
day_of_week_first_booking is highly correlated with days_from_first_active_until_booking and 2 other fields [High correlation]
year_account_created is highly correlated with df_index and 4 other fields [High correlation]
day_account_created is highly correlated with day_first_booking [High correlation]
week_of _year_first_account_created is highly correlated with df_index and 3 other fields [High correlation]
days_from_first_active_until_account_created is highly skewed (γ1 = 55.3260477) [Skewed]
id is uniformly distributed [Uniform]
df_index has unique values [Unique]
id has unique values [Unique]
signup_flow has 98317 (80.2%) zeros [Zeros]
days_from_first_active_until_booking has 14839 (12.1%) zeros [Zeros]
days_from_first_active_until_account_created has 122482 (99.9%) zeros [Zeros]
days_from_account_created_until_first_booking has 14842 (12.1%) zeros [Zeros]
day_of_week_first_booking has 64567 (52.6%) zeros [Zeros]
day_of _week_first_account_created has 18949 (15.5%) zeros [Zeros]
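
The percentages in the zeros alerts follow directly from the zero counts and the number of observations (122636). The snippet below is a minimal sketch that recomputes them for a selection of the variables listed above; the helper name `zero_share` is hypothetical and is not part of pandas-profiling.

```python
# Recompute the "Zeros" percentages from the counts listed in the alerts.
N_OBSERVATIONS = 122636

zero_counts = {
    "signup_flow": 98317,
    "days_from_first_active_until_booking": 14839,
    "days_from_first_active_until_account_created": 122482,
    "days_from_account_created_until_first_booking": 14842,
    "day_of_week_first_booking": 64567,
}


def zero_share(count, total=N_OBSERVATIONS):
    """Return the share of zeros as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


zero_pct = {name: zero_share(count) for name, count in zero_counts.items()}
for name, pct in zero_pct.items():
    print(f"{name}: {pct}%")
```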

Reproduction

Analysis started: 2022-07-23 12:35:02.694881
Analysis finished: 2022-07-23 12:36:18.163691
Duration: 1 minute and 15.47 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
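
The reported duration is the difference between the two timestamps. As a small illustration using only the Python standard library, it can be recomputed as follows; the format string is an assumption about how the timestamps are laid out, based on how they appear above.

```python
from datetime import datetime

# Assumed timestamp layout, matching the values shown in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-07-23 12:35:02.694881", FMT)
finished = datetime.strptime("2022-07-23 12:36:18.163691", FMT)

duration = finished - started
minutes, seconds = divmod(duration.total_seconds(), 60)
print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```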

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 122636
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104123.6797
Minimum: 1
Maximum: 213448
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 11367.5
Q1: 50493.5
median: 100991.5
Q3: 158162.25
95-th percentile: 202360.25
Maximum: 213448
Range: 213447
Interquartile range (IQR): 107668.75

Descriptive statistics

Standard deviation: 61722.63315
Coefficient of variation (CV): 0.5927819042
Kurtosis: -1.218367405
Mean: 104123.6797
Median Absolute Deviation (MAD): 53548
Skewness: 0.07863645755
Sum: 1.276931159 × 10^10
Variance: 3809683443
Monotonicity: Strictly increasing
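
Several of the df_index statistics are derived from others: the range is maximum minus minimum, the IQR is Q3 minus Q1, the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. The following sketch recomputes them from the rounded values printed above, so small rounding differences are expected.

```python
# Values copied from the df_index statistics in this report.
minimum, maximum = 1, 213448
q1, q3 = 50493.5, 158162.25
mean = 104123.6797
std = 61722.63315

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range
cv = std / mean                   # Coefficient of variation
variance = std ** 2               # Variance

print(f"Range: {value_range}")
print(f"IQR: {iqr}")
print(f"CV: {cv:.10f}")
print(f"Variance: {variance:.0f}")
```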
Histogram with fixed size bins (bins=50)
Value                   Count     Frequency (%)
1                       1         < 0.1%
139171                  1         < 0.1%
139197                  1         < 0.1%
139196                  1         < 0.1%
139193                  1         < 0.1%
139192                  1         < 0.1%
139191                  1         < 0.1%
139190                  1         < 0.1%
139188                  1         < 0.1%
139186                  1         < 0.1%
Other values (122626)   122626    > 99.9%

Value                   Count     Frequency (%)
1                       1         < 0.1%
2                       1         < 0.1%
3                       1         < 0.1%
4                       1         < 0.1%
6                       1         < 0.1%
7                       1         < 0.1%
8                       1         < 0.1%
9                       1         < 0.1%
10                      1         < 0.1%
11                      1         < 0.1%

Value                   Count     Frequency (%)
213448                  1         < 0.1%
213446                  1         < 0.1%
213445                  1         < 0.1%
213443                  1         < 0.1%
213441                  1         < 0.1%
213440                  1         < 0.1%
213439                  1         < 0.1%
213432                  1         < 0.1%
213430                  1         < 0.1%
213425                  1         < 0.1%

id
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 122636
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
820tgsjxq7
 
1
upghc731x8
 
1
2ajt2i0cwf
 
1
dpsojmdsqa
 
1
aennz2le8b
 
1
Other values (122631)
122631 

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 1226360
Distinct characters: 36
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
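
As an illustration of these character properties, Python's standard `unicodedata` module exposes the general category of each code point ("Ll" for Lowercase Letter, "Nd" for Decimal Number). The sketch below counts categories for the five sample ids listed in this section. It is not the code pandas-profiling runs internally, and the helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category (e.g. 'Ll', 'Nd')."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# The five sample ids shown for the id variable in this report.
sample_ids = ["820tgsjxq7", "4ft3gnwmtx", "bjjt8pjhuk", "87mebub9p4", "lsw9q7uk0j"]
counts = category_counts(sample_ids)
print(counts)
```

Script and block information, which the report also lists, is not available from `unicodedata` and would need another source.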

Unique

Unique: 122636
Unique (%): 100.0%

Sample

1st row: 820tgsjxq7
2nd row: 4ft3gnwmtx
3rd row: bjjt8pjhuk
4th row: 87mebub9p4
5th row: lsw9q7uk0j

Common Values

ValueCountFrequency (%)
820tgsjxq71
 
< 0.1%
upghc731x81
 
< 0.1%
2ajt2i0cwf1
 
< 0.1%
dpsojmdsqa1
 
< 0.1%
aennz2le8b1
 
< 0.1%
facfeibe4t1
 
< 0.1%
t0hkswek7a1
 
< 0.1%
z329kp10k21
 
< 0.1%
pz9eb42i121
 
< 0.1%
fqdedvewn81
 
< 0.1%
Other values (122626)122626
> 99.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
820tgsjxq71
 
< 0.1%
7i49vnuav61
 
< 0.1%
lsw9q7uk0j1
 
< 0.1%
0d01nltbrs1
 
< 0.1%
a1vcnhxeij1
 
< 0.1%
6uh8zyj2gn1
 
< 0.1%
yuuqmid2rp1
 
< 0.1%
om1ss59ys81
 
< 0.1%
dy3rgx56cu1
 
< 0.1%
ju3h98ch3w1
 
< 0.1%
Other values (122626)122626
> 99.9%

Most occurring characters

ValueCountFrequency (%)
h34387
 
2.8%
y34362
 
2.8%
l34304
 
2.8%
t34296
 
2.8%
a34266
 
2.8%
k34260
 
2.8%
f34258
 
2.8%
434219
 
2.8%
934152
 
2.8%
m34134
 
2.8%
Other values (26)883722
72.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter885988
72.2%
Decimal Number340372
 
27.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
h34387
 
3.9%
y34362
 
3.9%
l34304
 
3.9%
t34296
 
3.9%
a34266
 
3.9%
k34260
 
3.9%
f34258
 
3.9%
m34134
 
3.9%
x34133
 
3.9%
d34117
 
3.9%
Other values (16)543471
61.3%
Decimal Number
ValueCountFrequency (%)
434219
10.1%
934152
10.0%
334085
10.0%
034054
10.0%
734044
10.0%
534031
10.0%
834026
10.0%
134022
10.0%
234010
10.0%
633729
9.9%

Most occurring scripts

ValueCountFrequency (%)
Latin885988
72.2%
Common340372
 
27.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
h34387
 
3.9%
y34362
 
3.9%
l34304
 
3.9%
t34296
 
3.9%
a34266
 
3.9%
k34260
 
3.9%
f34258
 
3.9%
m34134
 
3.9%
x34133
 
3.9%
d34117
 
3.9%
Other values (16)543471
61.3%
Common
ValueCountFrequency (%)
434219
10.1%
934152
10.0%
334085
10.0%
034054
10.0%
734044
10.0%
534031
10.0%
834026
10.0%
134022
10.0%
234010
10.0%
633729
9.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII1226360
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
h34387
 
2.8%
y34362
 
2.8%
l34304
 
2.8%
t34296
 
2.8%
a34266
 
2.8%
k34260
 
2.8%
f34258
 
2.8%
434219
 
2.8%
934152
 
2.8%
m34134
 
2.8%
Other values (26)883722
72.1%

gender
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
FEMALE
56362 
MALE
49484 
-unknown-
16565 
OTHER
 
225

Length

Max length: 9
Median length: 6
Mean length: 5.596382791
Min length: 4
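
The mean length is the total number of characters divided by the number of rows. As a minimal check, it can be recomputed from the category counts reported for gender; the figures below are copied from this report and nothing else is assumed.

```python
# Category counts copied from the gender variable in this report.
gender_counts = {"FEMALE": 56362, "MALE": 49484, "-unknown-": 16565, "OTHER": 225}

n_rows = sum(gender_counts.values())
total_chars = sum(len(label) * count for label, count in gender_counts.items())
mean_length = total_chars / n_rows

print(f"Rows: {n_rows}")
print(f"Total characters: {total_chars}")
print(f"Mean length: {mean_length:.9f}")
```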

Characters and Unicode

Total characters: 686318
Distinct characters: 15
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: MALE
2nd row: FEMALE
3rd row: FEMALE
4th row: -unknown-
5th row: FEMALE

Common Values

ValueCountFrequency (%)
FEMALE56362
46.0%
MALE49484
40.4%
-unknown-16565
 
13.5%
OTHER225
 
0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
female56362
46.0%
male49484
40.4%
unknown16565
 
13.5%
other225
 
0.2%

Most occurring characters

ValueCountFrequency (%)
E162433
23.7%
M105846
15.4%
A105846
15.4%
L105846
15.4%
F56362
 
8.2%
n49695
 
7.2%
-33130
 
4.8%
u16565
 
2.4%
k16565
 
2.4%
o16565
 
2.4%
Other values (5)17465
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter537233
78.3%
Lowercase Letter115955
 
16.9%
Dash Punctuation33130
 
4.8%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E162433
30.2%
M105846
19.7%
A105846
19.7%
L105846
19.7%
F56362
 
10.5%
O225
 
< 0.1%
T225
 
< 0.1%
H225
 
< 0.1%
R225
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
n49695
42.9%
u16565
 
14.3%
k16565
 
14.3%
o16565
 
14.3%
w16565
 
14.3%
Dash Punctuation
ValueCountFrequency (%)
-33130
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin653188
95.2%
Common33130
 
4.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
E162433
24.9%
M105846
16.2%
A105846
16.2%
L105846
16.2%
F56362
 
8.6%
n49695
 
7.6%
u16565
 
2.5%
k16565
 
2.5%
o16565
 
2.5%
w16565
 
2.5%
Other values (4)900
 
0.1%
Common
ValueCountFrequency (%)
-33130
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII686318
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
E162433
23.7%
M105846
15.4%
A105846
15.4%
L105846
15.4%
F56362
 
8.2%
n49695
 
7.2%
-33130
 
4.8%
u16565
 
2.4%
k16565
 
2.4%
o16565
 
2.4%
Other values (5)17465
 
2.5%

age
Real number (ℝ≥0)

Distinct: 99
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.40559053
Minimum: 16
Maximum: 115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 16
5-th percentile: 23
Q1: 28
median: 34
Q3: 43
95-th percentile: 63
Maximum: 115
Range: 99
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 13.93990034
Coefficient of variation (CV): 0.3726689016
Kurtosis: 6.51646808
Mean: 37.40559053
Median Absolute Deviation (MAD): 7
Skewness: 2.089718287
Sum: 4587272
Variance: 194.3208214
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count    Frequency (%)
30                  6039     4.9%
31                  5935     4.8%
29                  5894     4.8%
28                  5862     4.8%
32                  5763     4.7%
27                  5671     4.6%
33                  5455     4.4%
26                  4960     4.0%
34                  4940     4.0%
35                  4777     3.9%
Other values (89)   67340    54.9%

Value               Count    Frequency (%)
16                  26       < 0.1%
17                  64       0.1%
18                  665      0.5%
19                  1097     0.9%
20                  533      0.4%
21                  969      0.8%
22                  1679     1.4%
23                  2424     2.0%
24                  3173     2.6%
25                  4405     3.6%

Value               Count    Frequency (%)
115                 12       < 0.1%
113                 4        < 0.1%
112                 1        < 0.1%
111                 2        < 0.1%
110                 188      0.2%
109                 31       < 0.1%
108                 15       < 0.1%
107                 23       < 0.1%
106                 17       < 0.1%
105                 1127     0.9%

signup_method
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
basic
66039 
facebook
56456 
google
 
141

Length

Max length: 8
Median length: 5
Mean length: 6.382212401
Min length: 5

Characters and Unicode

Total characters: 782689
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: facebook
2nd row: basic
3rd row: facebook
4th row: basic
5th row: basic

Common Values

ValueCountFrequency (%)
basic66039
53.8%
facebook56456
46.0%
google141
 
0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
basic66039
53.8%
facebook56456
46.0%
google141
 
0.1%

Most occurring characters

ValueCountFrequency (%)
b122495
15.7%
a122495
15.7%
c122495
15.7%
o113194
14.5%
s66039
8.4%
i66039
8.4%
e56597
7.2%
f56456
7.2%
k56456
7.2%
g282
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter782689
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
b122495
15.7%
a122495
15.7%
c122495
15.7%
o113194
14.5%
s66039
8.4%
i66039
8.4%
e56597
7.2%
f56456
7.2%
k56456
7.2%
g282
 
< 0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin782689
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
b122495
15.7%
a122495
15.7%
c122495
15.7%
o113194
14.5%
s66039
8.4%
i66039
8.4%
e56597
7.2%
f56456
7.2%
k56456
7.2%
g282
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII782689
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
b122495
15.7%
a122495
15.7%
c122495
15.7%
o113194
14.5%
s66039
8.4%
i66039
8.4%
e56597
7.2%
f56456
7.2%
k56456
7.2%
g282
 
< 0.1%

signup_flow
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 17
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.51951303
Minimum: 0
Maximum: 25
Zeros: 98317
Zeros (%): 80.2%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 24
Maximum: 25
Range: 25
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 6.604722668
Coefficient of variation (CV): 2.621428263
Kurtosis: 5.928297912
Mean: 2.51951303
Median Absolute Deviation (MAD): 0
Skewness: 2.705870564
Sum: 308983
Variance: 43.62236152
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=17)
ValueCountFrequency (%)
098317
80.2%
125996
 
4.9%
255845
 
4.8%
35035
 
4.1%
23823
 
3.1%
241596
 
1.3%
23993
 
0.8%
1509
 
0.4%
21195
 
0.2%
8142
 
0.1%
Other values (7)185
 
0.2%
ValueCountFrequency (%)
098317
80.2%
1509
 
0.4%
23823
 
3.1%
35035
 
4.1%
41
 
< 0.1%
527
 
< 0.1%
6139
 
0.1%
8142
 
0.1%
101
 
< 0.1%
125996
 
4.9%
ValueCountFrequency (%)
255845
4.8%
241596
 
1.3%
23993
 
0.8%
21195
 
0.2%
205
 
< 0.1%
169
 
< 0.1%
153
 
< 0.1%
125996
4.9%
101
 
< 0.1%
8142
 
0.1%

language
Categorical

Distinct: 25
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
en
118205 
zh
 
901
fr
 
807
es
 
625
de
 
407
Other values (20)
 
1691

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 245272
Distinct characters: 19
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

ValueCountFrequency (%)
en118205
96.4%
zh901
 
0.7%
fr807
 
0.7%
es625
 
0.5%
de407
 
0.3%
ko395
 
0.3%
it347
 
0.3%
ru269
 
0.2%
pt169
 
0.1%
ja130
 
0.1%
Other values (15)381
 
0.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en118205
96.4%
zh901
 
0.7%
fr807
 
0.7%
es625
 
0.5%
de407
 
0.3%
ko395
 
0.3%
it347
 
0.3%
ru269
 
0.2%
pt169
 
0.1%
ja130
 
0.1%
Other values (15)381
 
0.3%

Most occurring characters

ValueCountFrequency (%)
e119259
48.6%
n118277
48.2%
r1123
 
0.5%
h935
 
0.4%
z901
 
0.4%
f818
 
0.3%
s726
 
0.3%
t578
 
0.2%
d457
 
0.2%
o415
 
0.2%
Other values (9)1783
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter245272
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e119259
48.6%
n118277
48.2%
r1123
 
0.5%
h935
 
0.4%
z901
 
0.4%
f818
 
0.3%
s726
 
0.3%
t578
 
0.2%
d457
 
0.2%
o415
 
0.2%
Other values (9)1783
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Latin245272
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e119259
48.6%
n118277
48.2%
r1123
 
0.5%
h935
 
0.4%
z901
 
0.4%
f818
 
0.3%
s726
 
0.3%
t578
 
0.2%
d457
 
0.2%
o415
 
0.2%
Other values (9)1783
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII245272
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e119259
48.6%
n118277
48.2%
r1123
 
0.5%
h935
 
0.4%
z901
 
0.4%
f818
 
0.3%
s726
 
0.3%
t578
 
0.2%
d457
 
0.2%
o415
 
0.2%
Other values (9)1783
 
0.7%

affiliate_channel
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
direct
79093 
sem-brand
15347 
sem-non-brand
9677 
other
 
5357
seo
 
5288
Other values (3)
 
7874

Length

Max length: 13
Median length: 6
Mean length: 6.666109462
Min length: 3

Characters and Unicode

Total characters: 817505
Distinct characters: 17
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: seo
2nd row: direct
3rd row: direct
4th row: direct
5th row: other

Common Values

ValueCountFrequency (%)
direct79093
64.5%
sem-brand15347
 
12.5%
sem-non-brand9677
 
7.9%
other5357
 
4.4%
seo5288
 
4.3%
api5280
 
4.3%
content2000
 
1.6%
remarketing594
 
0.5%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
direct79093
64.5%
sem-brand15347
 
12.5%
sem-non-brand9677
 
7.9%
other5357
 
4.4%
seo5288
 
4.3%
api5280
 
4.3%
content2000
 
1.6%
remarketing594
 
0.5%

Most occurring characters

ValueCountFrequency (%)
e117950
14.4%
r110662
13.5%
d104117
12.7%
t89044
10.9%
i84967
10.4%
c81093
9.9%
n48972
6.0%
-34701
 
4.2%
a30898
 
3.8%
s30312
 
3.7%
Other values (7)84789
10.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter782804
95.8%
Dash Punctuation34701
 
4.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e117950
15.1%
r110662
14.1%
d104117
13.3%
t89044
11.4%
i84967
10.9%
c81093
10.4%
n48972
6.3%
a30898
 
3.9%
s30312
 
3.9%
m25618
 
3.3%
Other values (6)59171
7.6%
Dash Punctuation
ValueCountFrequency (%)
-34701
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin782804
95.8%
Common34701
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e117950
15.1%
r110662
14.1%
d104117
13.3%
t89044
11.4%
i84967
10.9%
c81093
10.4%
n48972
6.3%
a30898
 
3.9%
s30312
 
3.9%
m25618
 
3.3%
Other values (6)59171
7.6%
Common
ValueCountFrequency (%)
-34701
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII817505
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e117950
14.4%
r110662
13.5%
d104117
12.7%
t89044
10.9%
i84967
10.4%
c81093
9.9%
n48972
6.0%
-34701
 
4.2%
a30898
 
3.8%
s30312
 
3.7%
Other values (7)84789
10.4%

first_affiliate_tracked
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
untracked
64712 
linked
28284 
omg
24865 
tracked-other
 
3834
product
 
813
Other values (2)
 
128

Length

Max length: 13
Median length: 9
Mean length: 7.203366059
Min length: 3

Characters and Unicode

Total characters: 883392
Distinct characters: 19
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: untracked
2nd row: untracked
3rd row: untracked
4th row: untracked
5th row: untracked

Common Values

ValueCountFrequency (%)
untracked64712
52.8%
linked28284
23.1%
omg24865
 
20.3%
tracked-other3834
 
3.1%
product813
 
0.7%
marketing101
 
0.1%
local ops27
 
< 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
untracked64712
52.8%
linked28284
23.1%
omg24865
 
20.3%
tracked-other3834
 
3.1%
product813
 
0.7%
marketing101
 
0.1%
local27
 
< 0.1%
ops27
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e100765
11.4%
d97643
11.1%
k96931
11.0%
n93097
10.5%
t73294
8.3%
r73294
8.3%
c69386
7.9%
a68674
7.8%
u65525
7.4%
o29566
 
3.3%
Other values (9)115217
13.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter879531
99.6%
Dash Punctuation3834
 
0.4%
Space Separator27
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e100765
11.5%
d97643
11.1%
k96931
11.0%
n93097
10.6%
t73294
8.3%
r73294
8.3%
c69386
7.9%
a68674
7.8%
u65525
7.4%
o29566
 
3.4%
Other values (7)111356
12.7%
Dash Punctuation
ValueCountFrequency (%)
-3834
100.0%
Space Separator
ValueCountFrequency (%)
27
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin879531
99.6%
Common3861
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
e100765
11.5%
d97643
11.1%
k96931
11.0%
n93097
10.6%
t73294
8.3%
r73294
8.3%
c69386
7.9%
a68674
7.8%
u65525
7.4%
o29566
 
3.4%
Other values (7)111356
12.7%
Common
ValueCountFrequency (%)
-3834
99.3%
27
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII883392
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e100765
11.4%
d97643
11.1%
k96931
11.0%
n93097
10.5%
t73294
8.3%
r73294
8.3%
c69386
7.9%
a68674
7.8%
u65525
7.4%
o29566
 
3.3%
Other values (9)115217
13.0%

signup_app
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
Web
108283 
iOS
 
9689
Moweb
 
2364
Android
 
2300

Length

Max length: 7
Median length: 3
Mean length: 3.113571871
Min length: 3

Characters and Unicode

Total characters: 381836
Distinct characters: 13
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Web
2nd row: Web
3rd row: Web
4th row: Web
5th row: Web

Common Values

ValueCountFrequency (%)
Web108283
88.3%
iOS9689
 
7.9%
Moweb2364
 
1.9%
Android2300
 
1.9%

Length

Histogram of lengths of the category

Category Frequency Plot

ValueCountFrequency (%)
web108283
88.3%
ios9689
 
7.9%
moweb2364
 
1.9%
android2300
 
1.9%

Most occurring characters

ValueCountFrequency (%)
e110647
29.0%
b110647
29.0%
W108283
28.4%
i11989
 
3.1%
O9689
 
2.5%
S9689
 
2.5%
o4664
 
1.2%
d4600
 
1.2%
M2364
 
0.6%
w2364
 
0.6%
Other values (3)6900
 
1.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter249511
65.3%
Uppercase Letter132325
34.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e110647
44.3%
b110647
44.3%
i11989
 
4.8%
o4664
 
1.9%
d4600
 
1.8%
w2364
 
0.9%
n2300
 
0.9%
r2300
 
0.9%
Uppercase Letter
ValueCountFrequency (%)
W108283
81.8%
O9689
 
7.3%
S9689
 
7.3%
M2364
 
1.8%
A2300
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin381836
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e110647
29.0%
b110647
29.0%
W108283
28.4%
i11989
 
3.1%
O9689
 
2.5%
S9689
 
2.5%
o4664
 
1.2%
d4600
 
1.2%
M2364
 
0.6%
w2364
 
0.6%
Other values (3)6900
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII381836
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e110647
29.0%
b110647
29.0%
W108283
28.4%
i11989
 
3.1%
O9689
 
2.5%
S9689
 
2.5%
o4664
 
1.2%
d4600
 
1.2%
M2364
 
0.6%
w2364
 
0.6%
Other values (3)6900
 
1.8%

country_destination
Categorical

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB
NDF
32555 
US
28208 
CA
23632 
AU
22372 
DE
5696 
Other values (7)
10173 

Length

Max length: 5
Median length: 2
Mean length: 2.372411038
Min length: 2

Characters and Unicode

Total characters: 290943
Distinct characters: 20
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: US
2nd row: other
3rd row: US
4th row: US
5th row: US

Common Values

ValueCountFrequency (%)
NDF32555
26.5%
US28208
23.0%
CA23632
19.3%
AU22372
18.2%
DE5696
 
4.6%
other4372
 
3.6%
FR2130
 
1.7%
IT1228
 
1.0%
GB1019
 
0.8%
ES992
 
0.8%
Other values (2)432
 
0.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ndf32555
26.5%
us28208
23.0%
ca23632
19.3%
au22372
18.2%
de5696
 
4.6%
other4372
 
3.6%
fr2130
 
1.7%
it1228
 
1.0%
gb1019
 
0.8%
es992
 
0.8%
Other values (2)432
 
0.4%

Most occurring characters

ValueCountFrequency (%)
U50580
17.4%
A46004
15.8%
D38251
13.1%
F34685
11.9%
N32896
11.3%
S29200
10.0%
C23632
8.1%
E6688
 
2.3%
r4372
 
1.5%
e4372
 
1.5%
Other values (10)20263
7.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter269083
92.5%
Lowercase Letter21860
 
7.5%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
U50580
18.8%
A46004
17.1%
D38251
14.2%
F34685
12.9%
N32896
12.2%
S29200
10.9%
C23632
8.8%
E6688
 
2.5%
R2130
 
0.8%
T1319
 
0.5%
Other values (5)3698
 
1.4%
Lowercase Letter
ValueCountFrequency (%)
r4372
20.0%
e4372
20.0%
h4372
20.0%
t4372
20.0%
o4372
20.0%

Most occurring scripts

ValueCountFrequency (%)
Latin290943
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
U50580
17.4%
A46004
15.8%
D38251
13.1%
F34685
11.9%
N32896
11.3%
S29200
10.0%
C23632
8.1%
E6688
 
2.3%
r4372
 
1.5%
e4372
 
1.5%
Other values (10)20263
7.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII290943
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
U50580
17.4%
A46004
15.8%
D38251
13.1%
F34685
11.9%
N32896
11.3%
S29200
10.0%
C23632
8.1%
E6688
 
2.3%
r4372
 
1.5%
e4372
 
1.5%
Other values (10)20263
7.0%

days_from_first_active_until_booking
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 1853
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 356.2222104
Minimum: 0
Maximum: 2228
Zeros: 14839
Zeros (%): 12.1%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 222
Q3: 625
95-th percentile: 1124
Maximum: 2228
Range: 2228
Interquartile range (IQR): 622

Descriptive statistics

Standard deviation: 398.6656183
Coefficient of variation (CV): 1.119148685
Kurtosis: 0.02666812209
Mean: 356.2222104
Median Absolute Deviation (MAD): 221
Skewness: 0.9379958696
Sum: 43685667
Variance: 158934.2752
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
014839
 
12.1%
110592
 
8.6%
24795
 
3.9%
32956
 
2.4%
42183
 
1.8%
51694
 
1.4%
61328
 
1.1%
71237
 
1.0%
8992
 
0.8%
9812
 
0.7%
Other values (1843)81208
66.2%
ValueCountFrequency (%)
014839
12.1%
110592
8.6%
24795
 
3.9%
32956
 
2.4%
42183
 
1.8%
51694
 
1.4%
61328
 
1.1%
71237
 
1.0%
8992
 
0.8%
9812
 
0.7%
ValueCountFrequency (%)
22281
< 0.1%
20012
< 0.1%
19991
< 0.1%
19951
< 0.1%
19921
< 0.1%
19912
< 0.1%
19902
< 0.1%
19801
< 0.1%
19791
< 0.1%
19771
< 0.1%

days_from_first_active_until_account_created
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 132
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3723947291
Minimum: 0
Maximum: 1456
Zeros: 122482
Zeros (%): 99.9%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 0
Maximum: 1456
Range: 1456
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 15.17748441
Coefficient of variation (CV): 40.75644262
Kurtosis: 3655.645933
Mean: 0.3723947291
Median Absolute Deviation (MAD): 0
Skewness: 55.3260477
Sum: 45669
Variance: 230.356033
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0122482
99.9%
14
 
< 0.1%
64
 
< 0.1%
33
 
< 0.1%
293
 
< 0.1%
6342
 
< 0.1%
1032
 
< 0.1%
42
 
< 0.1%
202
 
< 0.1%
7222
 
< 0.1%
Other values (122)130
 
0.1%
ValueCountFrequency (%)
0122482
99.9%
14
 
< 0.1%
21
 
< 0.1%
33
 
< 0.1%
42
 
< 0.1%
52
 
< 0.1%
64
 
< 0.1%
71
 
< 0.1%
92
 
< 0.1%
101
 
< 0.1%
ValueCountFrequency (%)
14561
< 0.1%
13691
< 0.1%
13611
< 0.1%
11481
< 0.1%
10361
< 0.1%
10181
< 0.1%
10111
< 0.1%
9951
< 0.1%
8821
< 0.1%
8511
< 0.1%

days_from_account_created_until_first_booking
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 1873
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 355.8498157
Minimum: -349
Maximum: 2001
Zeros: 14842
Zeros (%): 12.1%
Negative: 25
Negative (%): < 0.1%
Memory size: 958.2 KiB

Quantile statistics

Minimum: -349
5-th percentile: 0
Q1: 3
median: 221
Q3: 624
95-th percentile: 1123
Maximum: 2001
Range: 2350
Interquartile range (IQR): 621

Descriptive statistics

Standard deviation: 398.4638995
Coefficient of variation (CV): 1.119753002
Kurtosis: 0.02362103566
Mean: 355.8498157
Median Absolute Deviation (MAD): 220
Skewness: 0.9374490797
Sum: 43639998
Variance: 158773.4792
Monotonicity: Not monotonic
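
The sums reported for the three "days_from" variables are consistent with days_from_first_active_until_booking being the sum of days_from_first_active_until_account_created and days_from_account_created_until_first_booking, which would account for the high-correlation alerts among them. This is an inference from the aggregate figures only; the report does not state the relationship, and the totals alone cannot confirm it row by row. The check below is a sketch of that arithmetic.

```python
# Sums copied from the descriptive statistics of the three variables.
sum_active_to_booking = 43685667   # days_from_first_active_until_booking
sum_active_to_created = 45669      # days_from_first_active_until_account_created
sum_created_to_booking = 43639998  # days_from_account_created_until_first_booking
n_observations = 122636

# If the relationship holds row by row, the totals must add up as well.
sums_add_up = (
    sum_active_to_created + sum_created_to_booking == sum_active_to_booking
)
mean_active_to_booking = sum_active_to_booking / n_observations

print(f"Sums add up: {sums_add_up}")
print(f"Mean days from first active until booking: {mean_active_to_booking:.7f}")
```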
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
014842
 
12.1%
110592
 
8.6%
24799
 
3.9%
32958
 
2.4%
42183
 
1.8%
51698
 
1.4%
61331
 
1.1%
71237
 
1.0%
8993
 
0.8%
9810
 
0.7%
Other values (1863)81193
66.2%
ValueCountFrequency (%)
-3491
< 0.1%
-3471
< 0.1%
-3381
< 0.1%
-3081
< 0.1%
-2981
< 0.1%
-2951
< 0.1%
-2691
< 0.1%
-2611
< 0.1%
-2081
< 0.1%
-1401
< 0.1%
ValueCountFrequency (%)
20012
< 0.1%
19991
< 0.1%
19951
< 0.1%
19921
< 0.1%
19912
< 0.1%
19902
< 0.1%
19801
< 0.1%
19791
< 0.1%
19771
< 0.1%
19761
< 0.1%

day_first_booking
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 21.61555334
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 14
median: 28
Q3: 29
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 9.283882799
Coefficient of variation (CV): 0.429500122
Kurtosis: -0.754697507
Mean: 21.61555334
Median Absolute Deviation (MAD): 2
Skewness: -0.853826875
Sum: 2650845
Variance: 86.19047983
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Value | Count | Frequency (%)
29 | 56857 | 46.4%
16 | 2344 | 1.9%
17 | 2328 | 1.9%
11 | 2321 | 1.9%
10 | 2318 | 1.9%
13 | 2313 | 1.9%
15 | 2300 | 1.9%
3 | 2291 | 1.9%
5 | 2284 | 1.9%
12 | 2280 | 1.9%
Other values (21) | 45000 | 36.7%

Value | Count | Frequency (%)
1 | 2104 | 1.7%
2 | 2189 | 1.8%
3 | 2291 | 1.9%
4 | 2161 | 1.8%
5 | 2284 | 1.9%
6 | 2228 | 1.8%
7 | 2259 | 1.8%
8 | 2272 | 1.9%
9 | 2212 | 1.8%
10 | 2318 | 1.9%

Value | Count | Frequency (%)
31 | 1194 | 1.0%
30 | 2055 | 1.7%
29 | 56857 | 46.4%
28 | 2157 | 1.8%
27 | 2098 | 1.7%
26 | 2149 | 1.8%
25 | 2195 | 1.8%
24 | 2219 | 1.8%
23 | 2212 | 1.8%
22 | 2234 | 1.8%

day_of_week_first_booking
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.549520532
Minimum: 0
Maximum: 6
Zeros: 64567
Zeros (%): 52.6%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 3
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.992697276
Coefficient of variation (CV): 1.286008952
Kurtosis: -0.4927185682
Mean: 1.549520532
Median Absolute Deviation (MAD): 0
Skewness: 0.9556386108
Sum: 190027
Variance: 3.970842433
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
0 | 64567 | 52.6%
1 | 10913 | 8.9%
2 | 10909 | 8.9%
3 | 10610 | 8.7%
4 | 10180 | 8.3%
5 | 7996 | 6.5%
6 | 7461 | 6.1%

Value | Count | Frequency (%)
0 | 64567 | 52.6%
1 | 10913 | 8.9%
2 | 10909 | 8.9%
3 | 10610 | 8.7%
4 | 10180 | 8.3%
5 | 7996 | 6.5%
6 | 7461 | 6.1%

Value | Count | Frequency (%)
6 | 7461 | 6.1%
5 | 7996 | 6.5%
4 | 10180 | 8.3%
3 | 10610 | 8.7%
2 | 10909 | 8.9%
1 | 10913 | 8.9%
0 | 64567 | 52.6%

year_account_created
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 958.2 KiB

Value | Count
2013 | 47619
2014 | 42059
2012 | 24820
2011 | 6743
2010 | 1395

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 490544
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2011
2nd row: 2010
3rd row: 2011
4th row: 2010
5th row: 2010

Common Values

Value | Count | Frequency (%)
2013 | 47619 | 38.8%
2014 | 42059 | 34.3%
2012 | 24820 | 20.2%
2011 | 6743 | 5.5%
2010 | 1395 | 1.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value | Count | Frequency (%)
2013 | 47619 | 38.8%
2014 | 42059 | 34.3%
2012 | 24820 | 20.2%
2011 | 6743 | 5.5%
2010 | 1395 | 1.1%

Most occurring characters

Value | Count | Frequency (%)
2 | 147456 | 30.1%
1 | 129379 | 26.4%
0 | 124031 | 25.3%
3 | 47619 | 9.7%
4 | 42059 | 8.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 490544 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 147456 | 30.1%
1 | 129379 | 26.4%
0 | 124031 | 25.3%
3 | 47619 | 9.7%
4 | 42059 | 8.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 490544 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 147456 | 30.1%
1 | 129379 | 26.4%
0 | 124031 | 25.3%
3 | 47619 | 9.7%
4 | 42059 | 8.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 490544 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 147456 | 30.1%
1 | 129379 | 26.4%
0 | 124031 | 25.3%
3 | 47619 | 9.7%
4 | 42059 | 8.6%

day_account_created
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 31
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15.8485355
Minimum: 1
Maximum: 31
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 8
median: 16
Q3: 23
95-th percentile: 29
Maximum: 31
Range: 30
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 8.747581973
Coefficient of variation (CV): 0.5519489149
Kurtosis: -1.189826952
Mean: 15.8485355
Median Absolute Deviation (MAD): 8
Skewness: -0.005988955419
Sum: 1943601
Variance: 76.52019038
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
Value | Count | Frequency (%)
24 | 4191 | 3.4%
16 | 4154 | 3.4%
23 | 4140 | 3.4%
20 | 4131 | 3.4%
18 | 4129 | 3.4%
28 | 4128 | 3.4%
27 | 4116 | 3.4%
12 | 4104 | 3.3%
11 | 4095 | 3.3%
10 | 4091 | 3.3%
Other values (21) | 81357 | 66.3%

Value | Count | Frequency (%)
1 | 3510 | 2.9%
2 | 3980 | 3.2%
3 | 3977 | 3.2%
4 | 3972 | 3.2%
5 | 4045 | 3.3%
6 | 4039 | 3.3%
7 | 3886 | 3.2%
8 | 3971 | 3.2%
9 | 4057 | 3.3%
10 | 4091 | 3.3%

Value | Count | Frequency (%)
31 | 2166 | 1.8%
30 | 3845 | 3.1%
29 | 3809 | 3.1%
28 | 4128 | 3.4%
27 | 4116 | 3.4%
26 | 4023 | 3.3%
25 | 3942 | 3.2%
24 | 4191 | 3.4%
23 | 4140 | 3.4%
22 | 4035 | 3.3%

day_of _week_first_account_created
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.756841384
Minimum: 0
Maximum: 6
Zeros: 18949
Zeros (%): 15.5%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 4
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.943418264
Coefficient of variation (CV): 0.704943808
Kurtosis: -1.148452726
Mean: 2.756841384
Median Absolute Deviation (MAD): 2
Skewness: 0.1708455936
Sum: 338088
Variance: 3.776874547
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
Value | Count | Frequency (%)
1 | 20245 | 16.5%
2 | 19662 | 16.0%
0 | 18949 | 15.5%
3 | 18632 | 15.2%
4 | 17119 | 14.0%
5 | 14027 | 11.4%
6 | 14002 | 11.4%

Value | Count | Frequency (%)
0 | 18949 | 15.5%
1 | 20245 | 16.5%
2 | 19662 | 16.0%
3 | 18632 | 15.2%
4 | 17119 | 14.0%
5 | 14027 | 11.4%
6 | 14002 | 11.4%

Value | Count | Frequency (%)
6 | 14002 | 11.4%
5 | 14027 | 11.4%
4 | 17119 | 14.0%
3 | 18632 | 15.2%
2 | 19662 | 16.0%
1 | 20245 | 16.5%
0 | 18949 | 15.5%

week_of _year_first_account_created
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 53
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 24.38924949
Minimum: 1
Maximum: 53
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 958.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 13
median: 23
Q3: 36
95-th percentile: 49
Maximum: 53
Range: 52
Interquartile range (IQR): 23

Descriptive statistics

Standard deviation: 14.0251999
Coefficient of variation (CV): 0.5750566415
Kurtosis: -0.9463708598
Mean: 24.38924949
Median Absolute Deviation (MAD): 11
Skewness: 0.2578035502
Sum: 2991000
Variance: 196.7062322
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
26 | 3982 | 3.2%
25 | 3691 | 3.0%
21 | 3635 | 3.0%
24 | 3622 | 3.0%
23 | 3584 | 2.9%
20 | 3492 | 2.8%
22 | 3331 | 2.7%
18 | 3247 | 2.6%
19 | 3245 | 2.6%
17 | 3241 | 2.6%
Other values (43) | 87566 | 71.4%

Value | Count | Frequency (%)
1 | 1919 | 1.6%
2 | 2286 | 1.9%
3 | 2451 | 2.0%
4 | 2243 | 1.8%
5 | 2286 | 1.9%
6 | 2375 | 1.9%
7 | 2282 | 1.9%
8 | 2375 | 1.9%
9 | 2552 | 2.1%
10 | 2514 | 2.0%

Value | Count | Frequency (%)
53 | 2 | < 0.1%
52 | 1655 | 1.3%
51 | 1714 | 1.4%
50 | 1754 | 1.4%
49 | 1973 | 1.6%
48 | 1739 | 1.4%
47 | 1765 | 1.4%
46 | 1858 | 1.5%
45 | 1827 | 1.5%
44 | 1631 | 1.3%

Interactions

[121 pairwise interaction plots (Matplotlib SVG images) omitted.]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
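
The calculation described above can be sketched in pure Python as follows. This is a minimal illustration, not the code used to produce this report: the helper names `average_ranks`, `pearson_r`, and `spearman_rho` are hypothetical, and tied values are assumed to share the mean of the ranks they occupy.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations.

    The 1/n factors cancel, so plain sums of deviations are used.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (y = x**3) still gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Ranking first is what makes the coefficient insensitive to the shape of a monotonic relationship: only the ordering of the values enters the Pearson formula.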

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
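
As a small sketch of this formula, the example below computes r with the standard library only and checks the location and scale invariance mentioned above. The function name `pearson_r` and the sample data are illustrative assumptions, not values from this dataset; the sample (n - 1) normalisation is used for both the covariance and the standard deviations, so the factors cancel.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample
    standard deviations. Raises ZeroDivisionError if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = mean(x), mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# Shifting and rescaling y (with a positive factor) leaves r unchanged.
y_shifted_scaled = [3 * v + 7 for v in y]
print(pearson_r(x, y), pearson_r(x, y_shifted_scaled))
```

Note that the invariance holds for positive scale factors; multiplying one variable by a negative number flips the sign of r.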

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
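
The pair-counting definition above corresponds to the variant usually called tau-a, and can be sketched directly. The function name `kendall_tau_a` is hypothetical, and this is not necessarily the variant computed for this report: statistical libraries often default to the tie-adjusted tau-b (SciPy's `kendalltau` does, for example), which differs when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, as described above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y are neither; they still count in the total.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The double loop over pairs is quadratic in the number of observations, so for a column with over 100,000 rows a library implementation would be the practical choice.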

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. This report uses the bias-corrected measure proposed by Bergsma in 2013.
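
A minimal sketch of a bias-corrected Cramér's V, following the correction commonly attributed to Bergsma (2013), is shown below. The function name `cramers_v_corrected` is hypothetical and the code is an assumption about how such a measure can be computed, not a copy of the report generator's implementation. With n observations, r row categories and k column categories, it uses φ² = χ²/n, subtracts (k-1)(r-1)/(n-1) from φ² (clipping at 0), and shrinks r and k by (r-1)²/(n-1) and (k-1)²/(n-1) respectively.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of categories."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return sqrt(phi2_corr / denom)


# Perfectly associated categories give a value close to 1.
print(cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50))
```

Clipping φ² at zero is what keeps the corrected coefficient from going negative when the observed association is smaller than the expected bias.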

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.

Sample

First rows

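(index) | df_index | id | gender | age | signup_method | signup_flow | language | affiliate_channel | first_affiliate_tracked | signup_app | country_destination | days_from_first_active_until_booking | days_from_first_active_until_account_created | days_from_account_created_until_first_booking | day_first_booking | day_of_week_first_booking | year_account_created | day_account_created | day_of _week_first_account_created | week_of _year_first_account_created
0 | 1 | 820tgsjxq7 | MALE | 38 | facebook | 0 | en | seo | untracked | Web | US | 2228 | 732 | 1496 | 29 | 0 | 2011 | 25 | 2 | 21
1 | 2 | 4ft3gnwmtx | FEMALE | 56 | basic | 3 | en | direct | untracked | Web | other | 419 | 476 | -57 | 2 | 0 | 2010 | 28 | 1 | 39
2 | 3 | bjjt8pjhuk | FEMALE | 42 | facebook | 0 | en | direct | untracked | Web | US | 1043 | 765 | 278 | 8 | 5 | 2011 | 5 | 0 | 49
3 | 4 | 87mebub9p4 | -unknown- | 41 | basic | 0 | en | direct | untracked | Web | US | 72 | 280 | -208 | 18 | 3 | 2010 | 14 | 1 | 37
4 | 6 | lsw9q7uk0j | FEMALE | 46 | basic | 0 | en | other | untracked | Web | US | 3 | 0 | 3 | 5 | 1 | 2010 | 2 | 5 | 53
5 | 7 | 0d01nltbrs | FEMALE | 47 | basic | 0 | en | direct | omg | Web | US | 10 | 0 | 10 | 13 | 2 | 2010 | 3 | 6 | 53
6 | 8 | a1vcnhxeij | FEMALE | 50 | basic | 0 | en | other | untracked | Web | US | 206 | 0 | 206 | 29 | 3 | 2010 | 4 | 0 | 1
7 | 9 | 6uh8zyj2gn | -unknown- | 46 | basic | 0 | en | other | omg | Web | NDF | 0 | 0 | 0 | 4 | 0 | 2010 | 4 | 0 | 1
8 | 10 | yuuqmid2rp | FEMALE | 36 | basic | 0 | en | other | untracked | Web | NDF | 2 | 0 | 2 | 6 | 2 | 2010 | 4 | 0 | 1
9 | 11 | om1ss59ys8 | FEMALE | 47 | basic | 0 | en | other | untracked | Web | NDF | 2001 | 0 | 2001 | 29 | 0 | 2010 | 5 | 1 | 1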

Last rows

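(index) | df_index | id | gender | age | signup_method | signup_flow | language | affiliate_channel | first_affiliate_tracked | signup_app | country_destination | days_from_first_active_until_booking | days_from_first_active_until_account_created | days_from_account_created_until_first_booking | day_first_booking | day_of_week_first_booking | year_account_created | day_account_created | day_of _week_first_account_created | week_of _year_first_account_created
122626 | 213425 | l1f71f9vsj | FEMALE | 30 | facebook | 0 | en | direct | linked | Web | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27
122627 | 213430 | 79wk7k2k5t | -unknown- | 19 | basic | 0 | en | direct | linked | Web | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27
122628 | 213432 | rg7ayg1tob | MALE | 31 | facebook | 0 | en | direct | tracked-other | Web | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27
122629 | 213439 | msucfwmlzc | MALE | 43 | basic | 0 | en | direct | untracked | Web | DE | 259 | 0 | 259 | 16 | 0 | 2014 | 30 | 0 | 27
122630 | 213440 | 04y8115avm | FEMALE | 24 | basic | 25 | en | direct | untracked | iOS | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27
122631 | 213441 | omlc9iku7t | FEMALE | 34 | basic | 0 | en | direct | linked | Web | DE | 44 | 0 | 44 | 13 | 2 | 2014 | 30 | 0 | 27
122632 | 213443 | 0k26r3mir0 | FEMALE | 36 | basic | 0 | en | sem-brand | linked | Web | DE | 13 | 0 | 13 | 13 | 6 | 2014 | 30 | 0 | 27
122633 | 213445 | qbxza0xojf | FEMALE | 23 | basic | 0 | en | sem-brand | omg | Web | DE | 2 | 0 | 2 | 2 | 2 | 2014 | 30 | 0 | 27
122634 | 213446 | zxodksqpep | MALE | 32 | basic | 0 | en | sem-brand | omg | Web | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27
122635 | 213448 | 6o3arsjbb4 | -unknown- | 32 | basic | 0 | en | direct | untracked | Web | DE | 364 | 0 | 364 | 29 | 0 | 2014 | 30 | 0 | 27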